Policy Evaluation Using the Ω-Return

Authors

  • Philip S. Thomas
  • Scott Niekum
  • Georgios Theocharous
  • George Konidaris

Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different-length returns. Because the Ω-return is difficult to compute exactly, we suggest one way of approximating it. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
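To make the contrast concrete, the sketch below builds the n-step returns of one episode and combines them in two ways. The first uses the fixed geometric weights of the episodic λ-return. The second uses minimum-variance weights w = Ω⁻¹1 / (1ᵀΩ⁻¹1) computed from a covariance matrix Ω of the returns. Returns of different lengths share their early rewards, which is why they are correlated. Several things here are assumptions rather than details taken from the abstract: the episodic setting, bootstrapping from a value estimate, and the use of the minimum-variance weighting as an illustration of "accounting for correlation". That weighting is not necessarily the paper's exact definition of the Ω-return, and the code does not implement the approximation the authors propose. All function names and the covariance values are hypothetical.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], values: Sequence[float],
                   gamma: float) -> List[float]:
    """Return [G^(1), ..., G^(T)] for an episode that starts in S_0.

    rewards[k] is R_{k+1} and values[k] is the current estimate of v(S_k).
    The episode terminates after T = len(rewards) steps, so G^(T) is the
    full Monte Carlo return and no bootstrap term is added to it.
    """
    T = len(rewards)
    returns: List[float] = []
    discounted = 0.0
    for n in range(1, T + 1):
        discounted += gamma ** (n - 1) * rewards[n - 1]
        # Bootstrap from v(S_n) unless the episode has already ended.
        bootstrap = gamma ** n * values[n] if n < T else 0.0
        returns.append(discounted + bootstrap)
    return returns


def lambda_weights(T: int, lam: float) -> List[float]:
    """Fixed geometric weights of the episodic lambda-return (they sum to 1)."""
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, T)]
    weights.append(lam ** (T - 1))  # remaining mass goes to the full return
    return weights


def _solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def min_variance_weights(omega: Sequence[Sequence[float]]) -> List[float]:
    """Weights w = Omega^{-1} 1 / (1^T Omega^{-1} 1), which sum to 1.

    ASSUMPTION: omega is a known covariance matrix of the n-step returns.
    """
    x = _solve(omega, [1.0] * len(omega))
    total = sum(x)
    return [xi / total for xi in x]


def weighted_return(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the n-step returns into a single target value."""
    return sum(w * g for w, g in zip(weights, returns))


if __name__ == "__main__":
    G = n_step_returns([1.0, 0.0, 2.0], [0.5, 0.4, 0.3], gamma=0.9)
    print("lambda-weighted:", weighted_return(G, lambda_weights(3, 0.5)))
    # Hypothetical covariance of (G^(1), G^(2), G^(3)); positive definite.
    omega = [[1.0, 0.9, 0.8], [0.9, 1.5, 1.2], [0.8, 1.2, 2.0]]
    print("min-variance weights:", min_variance_weights(omega))
    print("min-variance weighted:", weighted_return(G, min_variance_weights(omega)))
```

With a diagonal Ω these weights reduce to inverse-variance weighting. The off-diagonal entries are what let the combination account for correlated returns, and they can push some weights below zero. In practice Ω is unknown and has to be estimated or approximated.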

Publication date: 2015